- Title
- Optimal control and policy search in dynamical systems using expectation maximization
- Creator
- Mallick, Prakash
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2023
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- Trajectory optimization is a fundamental problem in stochastic optimal control. In this class of problems it is essential to account for measurement noise, which strongly affects dynamical systems undergoing motion or action, especially in uncertain environments. This thesis therefore develops a trajectory optimization approach for unknown dynamical systems subject to measurement noise. I propose an architecture that combines the benefits of a conventional optimal control procedure with those of maximum likelihood approaches, yielding a novel iterative trajectory optimization paradigm called Stochastic Optimal Control - Expectation Maximization. I examine the advantages of the proposed methodology in a reinforcement learning setting against widely used baselines. Another class of algorithms, Guided Policy Search, has proven highly accurate not only at controlling complicated dynamical systems but also at learning optimal policies that generalize to unseen instances. Almost all well-known policy search and learning algorithms assume that the true states are available. In contrast, I extend the stochastic optimal control approach to learning (optimal) policies when the states are latent. The resulting learning is less noisy because the optimal trajectories have lower variance. Theoretical and empirical evidence for the learnt optimal policies of the new approach is presented in comparison with well-known baselines, evaluated on a two-dimensional autonomous system using widely used performance metrics. I further provide extensive empirical results for a dynamical system performing complicated three-dimensional tasks.
The trajectory optimization procedure shows that the optimal policy parameters obtained by the maximum likelihood technique yield better performance, reducing the cumulative cost-to-go and the stochasticity of the state and action trajectories by efficiently balancing exploration and exploitation, a direction newly introduced in this thesis. Additionally, I present novel theoretical results that connect the proposed optimization objective to quantities from information theory.
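The abstract describes an iterative interplay between inferring latent states from noisy measurements and updating model/policy parameters by maximum likelihood. As a purely illustrative sketch (not the thesis algorithm, and all names and parameters below are hypothetical), an EM-style loop for identifying an unknown scalar linear-Gaussian system from measurements alone might alternate a Kalman-filter E-step with a least-squares M-step:

```python
import numpy as np

# Hypothetical illustration of an EM-style loop, not the thesis method.
# Latent dynamics: x_{t+1} = a_true * x_t + w_t,  w_t ~ N(0, q)
# Measurements:    y_t = x_{t+1} + v_t,           v_t ~ N(0, r)
rng = np.random.default_rng(1)
a_true, q, r, T = 0.9, 0.05, 0.1, 500

# Simulate one latent trajectory and its noisy measurements.
x = np.empty(T + 1)
x[0] = 1.0
for t in range(T):
    x[t + 1] = a_true * x[t] + rng.normal(0.0, np.sqrt(q))
y = x[1:] + rng.normal(0.0, np.sqrt(r), size=T)

a_est = 0.5  # deliberately poor initial guess of the dynamics gain
for _ in range(30):
    # E-step: Kalman-filter the measurements under the current guess a_est.
    m, p, means = 0.0, 1.0, []
    for t in range(T):
        m_pred, p_pred = a_est * m, a_est**2 * p + q
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y[t] - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
    means = np.asarray(means)
    # M-step (point-estimate simplification of the full EM update):
    # least-squares refit of the dynamics gain to the inferred states.
    a_est = float(means[:-1] @ means[1:]) / float(means[:-1] @ means[:-1])
```

After a few iterations the estimate recovers the true gain despite never observing the latent states directly, which is the flavour of maximum-likelihood trajectory optimization under measurement noise that the abstract alludes to.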
- Subject
- stochastic optimal control; model-based reinforcement learning; guided policy search; expectation maximization
- Identifier
- http://hdl.handle.net/1959.13/1477548
- Identifier
- uon:50000
- Rights
- Copyright 2023 Prakash Mallick
- Language
- eng
- Full Text
| File | Description | Size | Format |
|---|---|---|---|
| ATTACHMENT01 | Thesis | 7 MB | Adobe Acrobat PDF |
| ATTACHMENT02 | Abstract | 506 KB | Adobe Acrobat PDF |